
Poster: stbalbach Date: Jun 8, 2012 7:55am
Forum: texts Subject: Re: problem viewing full text

Line break codes are invisible.

Try this site

http://www.fileformat.info/convert/text/unix2dos.tr

Convert the file before uploading and see if that solves it.


Poster: andrewbontrager (Administrator, Curator, or Staff) Date: Jun 8, 2012 12:49pm
Forum: texts Subject: Re: problem viewing full text

Thanks for the tip. I tried sending a txt file to the site and got a 500 server error, shoot.


Poster: stbalbach Date: Jun 8, 2012 2:30pm
Forum: texts Subject: Re: problem viewing full text

OK, sorry, the site appears to be broken. How about converting to another format using this site:

http://document.online-convert.com/

Try any number of formats, such as text, html, pdf, doc -- then upload and see how Internet Archive displays.


Poster: andrewbontrager (Administrator, Curator, or Staff) Date: Jun 8, 2012 8:37pm
Forum: texts Subject: Re: problem viewing full text

Good idea, but I want to stick with the txt format.


Poster: aibek Date: Nov 7, 2012 11:35pm
Forum: texts Subject: Re: problem viewing full text

andrewbontrager,

My Unix computer shows that your file 282.txt is (using the `file' command):

Non-ISO extended-ASCII English text, with CR line terminators

This is highly non-standard. Something is wrong with your file. A web search for the string “Non-ISO extended-ASCII English text” turns up: “I realized that the majority of the file is in ISO-8859-1 and some parts in utf-8.…” http://stackoverflow.com/questions/5901633/perl-file-encoding-and-word-comparison
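For anyone without the `file' command, here is a rough Python sketch of the same two checks: which line terminators a file uses, and whether it contains bytes outside plain ASCII. The function names and the sample bytes are made up for illustration; this is not how `file' itself works internally.

```python
def count_line_endings(data: bytes) -> dict:
    """Count CRLF pairs, lone CRs, and lone LFs in raw bytes."""
    crlf = data.count(b"\r\n")
    cr = data.count(b"\r") - crlf  # CRs that are not part of a CRLF pair
    lf = data.count(b"\n") - crlf  # LFs that are not part of a CRLF pair
    return {"CRLF": crlf, "CR": cr, "LF": lf}


def has_non_ascii(data: bytes) -> bool:
    """True if any byte is 0x80 or above, i.e. the data is not plain ASCII."""
    return any(b >= 0x80 for b in data)


# Made-up sample: CR-only line endings plus one high byte (0x97).
sample = b"first line\rsecond line\x97\rthird line\r"
print(count_line_endings(sample))
print(has_non_ascii(sample))
```

To check a real file, read it in binary mode (open(path, "rb").read()) and pass the bytes in. Only lone CR counts being non-zero corresponds to "with CR line terminators".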

I will soon check your file and offer suggestions.

This post was modified by aibek on 2012-11-08 07:35:18


Poster: aibek Date: Nov 8, 2012 12:23am
Forum: texts Subject: Re: problem viewing full text

You have an invalid UTF-8 character in your file. Line 59 starts with:

What I claim as my improvement is

After that there is a 0x97 byte (octal \227) followed by two 0x0d bytes (CR, the classic Mac line terminator). A lone 0x97 is a UTF-8 continuation byte with no lead byte before it, so this sequence is invalid UTF-8.
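If you do not have a validator handy, a minimal Python sketch like the following can locate the first offending byte. It relies on the `start` attribute of `UnicodeDecodeError`; the function name is hypothetical and the sample bytes are reconstructed from the description above, not copied from 282.txt.

```python
def first_invalid_utf8(data: bytes):
    """Return (offset, byte value) of the first byte that breaks UTF-8
    decoding, or None if the data is valid UTF-8."""
    try:
        data.decode("utf-8")
    except UnicodeDecodeError as err:
        return err.start, data[err.start]
    return None


# Reconstructed sample: the quoted text, then 0x97, then two CRs.
sample = b"What I claim as my improvement is\x97\r\r"
result = first_invalid_utf8(sample)
if result is not None:
    offset, value = result
    print(f"invalid byte 0x{value:02x} at offset {offset}")
```

The offset is counted in bytes from the start of the data, so on the real file it should be comparable to the byte offset a validator reports.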

The proper way to correct this error is to replace the \227 with the proper character in UTF-8 encoding. The no-brainer way is to put a hyphen there, making the file ASCII. Or you may put an en-dash or an em-dash or a quotation-dash. In all the above cases, the file would be valid UTF-8; plain ASCII is, obviously, valid UTF-8.

Try to find out how you got such an exotic character, so that it does not happen with random files!

---
Output of `isutf8':
282.txt: line 1, char 1, byte offset 2913: invalid UTF-8 code

Valid UTF-8 codes:
– en-dash (U+2013)
— em-dash (U+2014)
― quotation-dash (U+2015)
(You can copy these from this page.) To know which one to use, consult your style guide! Check this, though: https://en.wikipedia.org/wiki/Dash

---

Note that the culprit character may not be visible in normal text editors even though it is there. (Some editors will even refuse to open the file.) Emacs, vim and hex editors show the character.


This post was modified by aibek on 2012-11-08 08:15:36

This post was modified by aibek on 2012-11-08 08:23:53


Poster: aibek Date: Nov 8, 2012 1:41am
Forum: texts Subject: Re: problem viewing full text

I found out how you got the character.

Either (i) your text editor is set to save files in ANSI aka Windows-1252 aka cp1252. In this encoding 0x97 is the value for an em-dash. (See the link.)
Or, (ii) you copied your em-dash from a file in the Windows-1252 encoding and pasted it into your text file.

(Either way the effect is the same -- your file is effectively in Win-1252 encoding.)

The solution to your problem is simple: convert the file from Win-1252 encoding to UTF-8 encoding. There are tools to do it automatically in Unix (the best is `iconv'). Or do it online (see the link below). Or, if you provide a collection of all the text files to me, I will convert it for you -- it is trivial on my machine.
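If neither `iconv' nor an online converter is at hand, the same conversion can be sketched in a few lines of Python using its built-in "cp1252" codec. The function name and sample bytes are made up for illustration. One caveat: a handful of byte values (0x81, 0x8d, 0x8f, 0x90, 0x9d) have no character assigned in Windows-1252, and Python will raise an error on them rather than guess.

```python
def cp1252_to_utf8(data: bytes) -> bytes:
    """Reinterpret Windows-1252 bytes as text and re-encode them as UTF-8."""
    text = data.decode("cp1252")  # 0x97 becomes the em dash, U+2014
    return text.encode("utf-8")   # U+2014 becomes the bytes e2 80 94


# Made-up sample containing the problematic 0x97 byte.
sample = b"What I claim as my improvement is\x97\r\r"
converted = cp1252_to_utf8(sample)
converted.decode("utf-8")  # no exception: the result is valid UTF-8
print(converted)
```

For a real file, read it with open(path, "rb"), convert, and write the result to a new file in binary mode so the original stays untouched. The line terminators are passed through unchanged; only the character encoding is converted.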

To keep the problem from recurring, set your text editor to save files in UTF-8. Win-1252 is deprecated. (See link.) Or, if (ii) above is true, stop using your ‘master’ em-dash (the one you copy into the files you edit).

http://www.unicode.org/Public/MAPPINGS/VENDORS/MICSFT/WindowsBestFit/bestfit1252.txt

http://kanjidict.stc.cx/recode.php

https://en.wikipedia.org/wiki/Code_page#Windows_.28ANSI.29_code_pages


This post was modified by aibek on 2012-11-08 09:41:27
